Add label join tests using fixtures #862
base: main
Conversation
Walkthrough

This update introduces new test utilities for loading CSV data and parsing timestamps, and adds explicit test resources to the Bazel `batch_test` target.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as LabelJoinV2Test
    participant Utils as TestUtils
    participant Table as TableTestUtils
    participant Spark as SparkSession
    Test->>Utils: createTableWithCsvData(...)
    Utils->>Utils: createDataframeFromCsv(...)
    Utils->>Spark: Read CSV, convert timestamp
    Utils->>Table: insertPartitions(DataFrame)
    Test->>Table: Setup tables for join
    Test->>Test: Run label join with/without rounding flag
    Test->>Spark: Assert DataFrame equality
```
Actionable comments posted: 2
♻️ Duplicate comments (1)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)
830-838: Same issue for the round-down session

Add `import sparkRoundDownWithFixture.implicits._` right after the session is created.
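A minimal sketch of that fix (the builder call and app name here are assumptions for illustration, not code from the PR; only the session name and the import placement come from the comment above):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical session construction for the round-down test case.
val sparkRoundDownWithFixture: SparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("LabelJoinV2Test-roundDown")
  .getOrCreate()

// Bring the session's implicits into scope immediately, so Seq(...).toDF
// and $-string column syntax resolve against this session in the
// expected-DataFrame construction that follows.
import sparkRoundDownWithFixture.implicits._
```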
🧹 Nitpick comments (2)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)
679-707: Use Bazel Runfiles API instead of manual `RUNFILES_DIR` concat

Hard-coding:

```scala
val runfilesDir = System.getenv("RUNFILES_DIR")
Paths.get(runfilesDir, "chronon/...")
```

works only when that env var is present and the path layout stays constant.

Prefer:

```scala
val runfiles = com.google.devtools.build.runfiles.Runfiles.create()
val csvPath = runfiles.rlocation("chronon/spark/src/test/resources/local_data_csv/sample_join.csv")
```

This survives all Bazel runfiles quirks and Windows paths.
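Wrapping that suggestion in a small helper might look like the sketch below (the `chronon/...` workspace prefix and CSV path are taken from the snippet above; having the Bazel Java runfiles library on the test classpath is an assumption):

```scala
import com.google.devtools.build.runfiles.Runfiles

// Resolve a test resource through Bazel's runfiles machinery instead of
// concatenating RUNFILES_DIR by hand; rlocation handles manifest-based
// runfiles (e.g. on Windows) as well as directory-based ones.
def fixturePath(workspaceRelativePath: String): String = {
  val runfiles = Runfiles.create()
  runfiles.rlocation(workspaceRelativePath)
}

val csvPath =
  fixturePath("chronon/spark/src/test/resources/local_data_csv/sample_join.csv")
```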
spark/src/test/scala/ai/chronon/spark/test/TestUtils.scala (1)
454-473: CSV utilities – make parsing deterministic

- `inferSchema = true` makes tests flaky when the sample set changes. Pass an explicit `StructType` or keep `inferSchema = false` with predefined column types.
- `unix_timestamp` uses the session timezone; tests may break on machines with a non-UTC TZ. Consider `to_utc_timestamp` or set `spark.sql.session.timeZone` explicitly inside the reader.
- When `tsColName != "ts"` the original string column stays behind. If not needed, drop it to avoid clutter:

```scala
.withColumn("ts", (unix_timestamp(col(tsColName), tsFormat) * 1000).cast("long"))
.drop(tsColName)
```
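A deterministic variant along those lines might look like this sketch (the helper name and the `tsColName`/`tsFormat` parameters follow the review comment; the concrete column set in the schema is illustrative, not the PR's actual fixture schema):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, unix_timestamp}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

def createDataframeFromCsv(spark: SparkSession,
                           csvPath: String,
                           tsColName: String = "ts",
                           tsFormat: String = "yyyy-MM-dd HH:mm:ss"): DataFrame = {
  // Pin the session timezone so unix_timestamp parses identically on any host.
  spark.conf.set("spark.sql.session.timeZone", "UTC")

  // Explicit schema: no inference, so new sample rows cannot change column types.
  val schema = StructType(Seq( // illustrative columns
    StructField("user", StringType, nullable = true),
    StructField(tsColName, StringType, nullable = true)
  ))

  val parsed = spark.read
    .option("header", "true")
    .schema(schema)
    .csv(csvPath)
    .withColumn("ts", (unix_timestamp(col(tsColName), tsFormat) * 1000).cast(LongType))

  // Drop the raw string column only when it is distinct from "ts";
  // otherwise we would delete the column we just produced.
  if (tsColName == "ts") parsed else parsed.drop(tsColName)
}
```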
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)

- `spark/src/test/resources/local_data_csv/sample_join.csv` is excluded by `!**/*.csv`
- `spark/src/test/resources/local_data_csv/sample_label_loose_source.csv` is excluded by `!**/*.csv`
- `spark/src/test/resources/local_data_csv/sample_label_tight_source.csv` is excluded by `!**/*.csv`
📒 Files selected for processing (3)

- `spark/BUILD.bazel` (1 hunks)
- `spark/src/test/scala/ai/chronon/spark/test/TestUtils.scala` (2 hunks)
- `spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala` (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: fetcher_tests
- GitHub Check: join_tests
- GitHub Check: groupby_tests
- GitHub Check: streaming_tests
- GitHub Check: analyzer_tests
- GitHub Check: spark_tests
- GitHub Check: batch_tests
🔇 Additional comments (2)
spark/src/test/scala/ai/chronon/spark/test/batch/LabelJoinV2Test.scala (1)
780-827: Schema mismatch risk in hard-coded expected DataFrames

`expectedMay4Df` builds the `ts` column as a string, whereas the fixture loader casts it to `LongType`. `DataFrame.except` requires identical schemas and will fail if `labelComputedMay4.ts` is `LongType`. Cast explicitly to keep types aligned:

```diff
-("1746266400000",
+1746266400000L,
```

or add `.cast("long")` after construction.

spark/BUILD.bazel (1)
147-148: Good addition – resources now available to `batch_test`

Including `//spark/src/test/resources:test-resources` in `data` ensures the CSV fixtures land in runfiles. 👍